Recap Clip 7.5: Partially Observable MDPs

We're at the border of going from agent designs without learning to agent designs with learning.

And yesterday we culminated our study of reasoning and deciding under uncertainty, which is what realistic real-world agents have to do, with a very brief introduction to POMDPs.

Partially observable Markov decision processes.

Markov decision processes, you remember, are these environments where we have reliable, deterministic sensors, which make the environment fully observable, but non-deterministic actions.

You do something, but that can go wrong.

POMDPs add unreliable sensors to that mix.

And that's essentially everything that can go wrong.

So in our little example, we added an unreliable sensor, which basically just introduces an error into our considerations.

And there's really only one thing to remember about these: you can solve POMDPs by doing MDP not at the level of states, but at the level of belief states.

A very simple realization, but one with a lot of consequences, both good and bad.

One of the good things is that we're essentially getting exactly what we need, namely that the belief state is fully observable.

We know what we believe as an agent.

We can observe our own belief state.

And so we can actually compute on that.

We can't get below the belief state because we know nothing, in principle, about the actual

state.

We can only have approximations or beliefs over the set of possible states.
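
To make "computing on the belief state" concrete, here is a minimal Python sketch of the standard Bayesian belief update in a made-up two-state world with a noisy sensor; the states, actions, and all probabilities below are invented for illustration and are not the lecture's own example.

```python
# Minimal belief-update sketch for a tiny, made-up two-state world.
# The transition model T and sensor model O are invented for
# illustration; they are not the lecture's example.

STATES = ["left", "right"]

# T[a][s][s2]: probability of ending in s2 when doing a in s.
# Actions are non-deterministic: they succeed with probability 0.8.
T = {
    "go_left":  {"left":  {"left": 1.0, "right": 0.0},
                 "right": {"left": 0.8, "right": 0.2}},
    "go_right": {"left":  {"left": 0.2, "right": 0.8},
                 "right": {"left": 0.0, "right": 1.0}},
}

# O[s][o]: the unreliable sensor reports the true state with
# probability 0.9 -- this 0.1 error is what POMDPs add to MDPs.
O = {
    "left":  {"see_left": 0.9, "see_right": 0.1},
    "right": {"see_left": 0.1, "see_right": 0.9},
}

def update_belief(b, a, o):
    """One filtering step: b'(s2) = alpha * O(o|s2) * sum_s T(s2|s,a) * b(s)."""
    new_b = {}
    for s2 in STATES:
        predicted = sum(T[a][s][s2] * b[s] for s in STATES)
        new_b[s2] = O[s2][o] * predicted
    norm = sum(new_b.values())  # alpha, the normalization constant
    return {s: p / norm for s, p in new_b.items()}

b0 = {"left": 0.5, "right": 0.5}                 # maximal uncertainty
b1 = update_belief(b0, "go_right", "see_right")  # act, then sense
print(b1)  # belief mass shifts strongly toward "right"
```

Given the action taken and the observation received, the next belief state is computed deterministically from the current one, which is exactly why the agent can always "observe" its own belief state.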

So the good thing is that we can, in principle, run MDP algorithms one level up.

The bad news is that we only know MDP algorithms that can deal with discrete spaces, and the belief space isn't discrete: it is continuous.
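
For contrast, here is a generic textbook value-iteration sketch for a discrete MDP, reusing the toy STATES and T tables from the sketch above; the reward table R and discount GAMMA are invented for illustration. The point is the "for s in states" loop: it only makes sense over a finite state set, and the belief space of a POMDP is a continuous simplex with nothing to enumerate.

```python
# Generic textbook value iteration for a discrete MDP -- a sketch,
# not the lecture's algorithm. Reuses STATES and T from above;
# the rewards R and discount GAMMA are invented for illustration.

ACTIONS = ["go_left", "go_right"]
R = {"left": 0.0, "right": 1.0}
GAMMA = 0.9

def value_iteration(states, actions, T, R, eps=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:  # requires a *finite* set of states --
                          # a continuous belief space has no such list
            best = max(sum(T[a][s][s2] * (R[s2] + GAMMA * V[s2])
                           for s2 in states)
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

print(value_iteration(STATES, ACTIONS, T, R))
```

The Bellman backup itself carries over to belief-state MDPs, but the maximization would have to range over a continuum of belief states, which is one reason exact POMDP algorithms are so expensive.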

So even though we have this wonderful realization that we don't have to learn anything new,

that turns out to be false because we didn't learn enough earlier.

If we had the full theory of MDPs in continuous spaces, which we don't cover here because it needs much more mathematical foundation, then we could apply it.

But these AI lectures are just an introduction, and the literature is out there, so you can

actually read up on this if you're interested.

The other thing to realize is that the real world is indeed a POMDP.

Whatever we do, we don't have reliable actions.

If you've ever tried to shoot a hoop in basketball, you know the typical outcome, at least for me.

I plan it as best I can.

I know exactly where the ball should go, only it doesn't.

And of course, we have, in practice, limited accuracies on all of our sensors.

Sometimes that matters, sometimes that doesn't matter.

So in a way, the conclusion from the theoretical part was that the real world is a POMDP, meaning we now have the theory for it, too.

We can describe things, but all algorithms are pretty terrible in their complexity.

And everything grows tremendously as our models get finer and finer.

Part of chapter: Recaps

Access: open

Duration: 00:05:16 min

Recorded: 2021-03-30

Uploaded: 2021-03-31 11:06:32

Language: en-US

Recap: Partially Observable MDPs

The main video on this topic is Chapter 7, Clip 5.
